This article presents a hub-based approach to community finding in complex networks. After
identifying the network nodes with highest degree (the so-called hubs), the network is flooded with
wavefronts of labels emanating from the hubs, which leads to the identification of the respective
communities. The simplicity and potential of this method, which is presented for directed/undirected
and weighted/unweighted networks, are illustrated with respect to the Zachary karate club data,
image segmentation, and concept association. Attention is also given to the identification of the
boundaries between communities.

PACS numbers: 89.75.Hc, 89.75.Kd, 89.75.-k, 85.57.Nk 



The problem of community finding in complex networks represents one of the most challenging
and promising perspectives from which to approach, characterize and understand those general
structures. Related to established areas in graph theory (e.g. [?]) and pattern recognition
(e.g. [?]), the interest in community finding in complex networks was fostered by sociological
studies (e.g. [?]) and further enhanced by the seminal articles by Wu and Huberman [8] and
Newman and Girvan [9]. The latter defined the problem of community finding as 'the division of
network nodes into groups within which the network connections are dense, but between which are
sparse'. That work also proposed a divisive methodology based on the concept of shortest path
betweenness, which has become the main reference for community finding investigations given its
good performance, despite its relatively high computational demand. Other approaches include
the method based on an analogy with electrical circuits [8], consideration of triangular loops
in the network [10], application of super-paramagnetic clustering [11], analysis of the spectral
properties of the networks [12], and spectral properties of the Laplacian matrix combined with
clustering techniques [?]. Despite the growing attention focused on this issue (see [?] for a
good characterization of the problem and extensive related references), some important points
remain incompletely solved, including the definition of a community and the high computational
demand implied by the most effective techniques.

The current work describes an alternative approach to community finding which is based on one
of the most characteristic concepts underlying the new area of complex networks, namely that of
a hub, i.e. a node in a network exhibiting high degree. As emphasized by several investigations
targeting complex networks, such nodes play a determinant role in defining the connectivity
patterns in several natural structures and systems. Therefore, the consideration of hubs as
starting points for network partition represents a particularly promising perspective from which
to approach the community finding problem, a possibility which was preliminarily considered in
[?]. The current article reports on a simple and powerful hub-based community finding
methodology which involves the flooding of the network with labels emanating with constant speed
from the respective hubs. Such a procedure, which is related to the concept of distance
transform [?, 15] in graphs [?] and label propagation in orthogonal lattices [?, 17], provides a
simple and natural means for partitioning networks, especially those organized around hubs, into
coherent communities. Such a methodology, as well as a post-processing step allowing integration
of border elements, is presented and illustrated in the following with respect to three
representative weighted/unweighted and directed/undirected networks, namely the well-known
Zachary karate club, image segmentation, and concept associations.

The network under analysis is assumed to have $N$ nodes, labeled as $i = 1, 2, \ldots, N$, and
$n$ edges represented as $(i, j)$, which can have unit or general weight $w_{ij}$, represented
as $w(j, i)$ in the respective weight matrix. The outdegree $O_i$ of a specific node $i$ is
herein defined as the sum of the weights of the emerging edges, i.e.
$O_i = \sum_{k=1}^{N} w(k, i)$, while the indegree is defined as
$I_i = \sum_{k=1}^{N} w(i, k)$. Observe that undirected networks are characterized by
$O_i = I_i$ for any $i$. The hubs are henceforth understood as the set of $M$ nodes with the
highest degrees. The $d$-ball with radius $d$ centered at node $i$ is defined as the subgraph
containing all nodes which are connected to $i$ through shortest paths no longer than $d$. The
label of a specific node $i$ can be propagated through the network by identifying the $d$-balls
centered at $i$ for successive distance values $d$. If such wavefronts are started at each of
the $M$ hubs, the respective labels are propagated as long as the nodes being reached by the
wavefronts are empty, i.e. have not been visited by another front. In this work, such a label
propagation is performed so that, for the same value of $d$, the labels emanating from the hubs
with higher degree are propagated before those emanating from hubs with lower degree.
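
The flooding described above can be sketched as a breadth-first propagation started simultaneously at all hubs. The following is only an illustrative sketch for unweighted networks: the function name `hub_flood`, the adjacency-dictionary layout, and the tie-break by node identifier between hubs of equal degree are assumptions made here, not details taken from the method itself.

```python
def hub_flood(adj, num_hubs):
    """Flood labels from the num_hubs highest-degree nodes (unit weights).

    adj maps every node to the list of nodes it points to.
    Returns a dict node -> label of the hub whose wavefront reached it first.
    Nodes that no wavefront reaches are left out of the result.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    # Hubs in decreasing order of degree; equal degrees are ordered by
    # node identifier (an assumption, the text does not fix this case).
    hubs = sorted(adj, key=lambda v: (-degree[v], v))[:num_hubs]
    label = {h: h for h in hubs}
    # The frontier is kept grouped by hub, strongest hub first, so that at
    # each distance d the higher-degree hubs claim empty nodes first.
    frontier = list(hubs)
    while frontier:
        next_frontier = []
        for v in frontier:
            for u in adj[v]:
                if u not in label:  # node is still "empty"
                    label[u] = label[v]
                    next_frontier.append(u)
        frontier = next_frontier
    return label


# Hypothetical toy network: node 5 is one step away from both hubs (1 and 6).
toy = {1: [2, 3, 4, 5], 2: [1], 3: [1], 4: [1],
       5: [1, 6], 6: [5, 7, 8], 7: [6], 8: [6]}
labels = hub_flood(toy, 2)
print(labels[5])  # node 5 goes to hub 1, which has the higher degree
```

Because each node is labeled at most once and each adjacency list is scanned once, the cost of this sketch grows linearly with the number of nodes plus edges.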

The result of such a flooding procedure is to partition the original network into $M$
communities, which can also be understood as the Voronoi tessellation of the original network
[?]. Observe that the above procedure implies that those nodes that are at the same distance
from two hubs are dominated by the hub with the higher degree. It also implies that two (or
more) hubs $a$ and $b$ with $O_a > O_b$ sharing most connections, as is the case with nodes 33
and 34 in the Zachary club network (see Figure 1), may produce different communities. In case
it is desired to merge such hubs, which is an application-dependent decision, the following
post-processing can be performed. For each node $i$, identify all its emanating direct
connections, whose number is represented as $E_i$, and identify the mode (i.e. the most frequent
value) $m$ among the labels of the nodes connected to $i$, with $M_i$ denoting the number of
those nodes carrying label $m$. In case the ratio $R_i$ given in Equation 1 is larger than a
pre-specified threshold value $T$, the node $i$ receives the label $m$. For weighted networks,
it is also possible to consider the ratio between the sum of the weights of the connections to
nodes with label equal to the mode and the total sum of emerging edge weights (see Equation 2).

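$$R_i = \frac{M_i}{E_i} \qquad (1)$$

$$R_w = \frac{\sum_{k \in \mathrm{mode}} w(k, i)}{\sum_{k=1}^{E_i} w(k, i)} \qquad (2)$$

$$Q = \sum_{i} \left( e_{ii} - a_i^2 \right) \qquad (3)$$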
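
The node-merging rule of Equation 1 can be illustrated with a short sketch for the unweighted case. The function name `merge_borders` is hypothetical, and two details are assumptions made here because the text leaves them open: all nodes are relabeled in one synchronous pass (every ratio is computed from the labels before the pass), and a tie between equally frequent labels is resolved in favor of the label encountered first.

```python
from collections import Counter


def merge_borders(adj, label, threshold):
    """Relabel each node with the mode of its neighbors' labels when
    R_i = M_i / E_i exceeds the threshold T (unweighted case)."""
    new_label = dict(label)
    for i, nbrs in adj.items():
        if not nbrs:
            continue  # isolated node: nothing to compare against
        counts = Counter(label[u] for u in nbrs if u in label)
        if not counts:
            continue  # none of the neighbors carries a label
        mode, m_i = counts.most_common(1)[0]
        if m_i / len(nbrs) > threshold:  # R_i > T
            new_label[i] = mode
    return new_label


# Hypothetical example: node 'x' starts in community 'B' but has two of
# its three connections in community 'A'.
adj = {'a': ['b', 'c', 'x'], 'b': ['a', 'c', 'x'], 'c': ['a', 'b'],
       'x': ['a', 'b', 'y'], 'y': ['x', 'z'], 'z': ['y']}
label = {'a': 'A', 'b': 'A', 'c': 'A', 'x': 'B', 'y': 'B', 'z': 'B'}
print(merge_borders(adj, label, 0.5)['x'])  # 'A', since 2/3 > 0.5
```

A weighted variant following Equation 2 would replace the neighbor counts by sums of edge weights.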

A particularly interesting, and somewhat overlooked, feature of a community partition of a
complex network is the boundaries between the identified communities. The boundaries can be
defined with respect to nodes or edges. In the former case, the boundary of community $i$ can be
easily identified by looking for each node with label $i$ which is linked to at least one node
with a different label. Such a boundary, which is relative to community $i$, is henceforth
called the node-boundary of $i$. The edge-boundary between two communities $i$ and $j$
corresponds to those edges connecting nodes of $i$ to nodes of $j$ (the edge direction may or
may not be taken into account in the case of directed networks).
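
Both kinds of boundary follow directly from a labeled adjacency structure. The sketch below is illustrative only; the function names, the adjacency-dictionary layout and the `directed` flag are assumptions introduced here.

```python
def node_boundary(adj, label, community):
    """Nodes of `community` linked to at least one node with another label."""
    return {v for v, nbrs in adj.items()
            if label[v] == community
            and any(label[u] != community for u in nbrs)}


def edge_boundary(adj, label, ci, cj, directed=False):
    """Edges joining community ci to community cj, as (node in ci, node in cj).

    With directed=True only edges pointing from ci to cj are kept;
    otherwise the edge direction is ignored.
    """
    edges = set()
    for v, nbrs in adj.items():
        for u in nbrs:
            if label[v] == ci and label[u] == cj:
                edges.add((v, u))
            elif not directed and label[v] == cj and label[u] == ci:
                edges.add((u, v))
    return edges


# Hypothetical undirected example with two communities 'A' and 'B'.
adj = {'a': ['b', 'c', 'x'], 'b': ['a', 'c', 'x'], 'c': ['a', 'b'],
       'x': ['a', 'b', 'y'], 'y': ['x', 'z'], 'z': ['y']}
label = {'a': 'A', 'b': 'A', 'c': 'A', 'x': 'B', 'y': 'B', 'z': 'B'}
print(node_boundary(adj, label, 'A'))       # the nodes 'a' and 'b'
print(edge_boundary(adj, label, 'A', 'B'))  # the edges a-x and b-x
```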

FIG. 1: Hub-based partition of the Zachary karate club network: the obtained communities are
identified by the node size. Only node 3 was labeled differently than in the original classes.
The edge-boundary between the two communities is identified by thicker edges.

Although the above described hub-based methodology can be immediately applied to unweighted
(i.e. weights are 0 or 1) or weighted networks, some remarks regarding computational
implementation should be considered. For undirected networks, it is more effective to follow
the subsequent connections defined by the label flooding by looking for non-zero entries along
the columns of the weight matrix and using lists for book-keeping. It can be verified that such
a processing can be performed in $O(N)$, as the nodes are checked only once during the labeling
procedure. A possible means of processing weighted networks is to visit each node while
identifying the shortest path [?] to each of the $M$ hubs, taking as result the label of the
closest hub. In case two (or more) hubs are found at the same shortest path distance, that with
the highest node degree is selected. The computational cost of finding the shortest paths
between each of the $N$ nodes and the $M$ hubs can be optimized by using algorithms such as
Dijkstra's, which implies $O(N \log N + n)$ [?]. Effective algorithms for distance
transformation in graphs [16] can also be considered for further enhancing the performance.
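
For weighted networks, the nearest-hub rule can be sketched with one run of Dijkstra's algorithm per hub. This is a minimal sketch under stated assumptions: the names `dijkstra` and `weighted_hub_labels` and the nested-dictionary layout are hypothetical, edge weights are treated as non-negative lengths, and distances are measured from the hub towards the node. The binary heap from the standard library gives a somewhat higher cost than the bound quoted above, which requires a more elaborate priority queue.

```python
import heapq


def dijkstra(wadj, source):
    """Shortest-path distances from source; wadj[v] maps neighbor -> weight."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for u, w in wadj[v].items():
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist


def weighted_hub_labels(wadj, num_hubs):
    """Give each node the label of its closest hub; ties go to the hub
    with the larger degree (sum of emerging edge weights)."""
    degree = {v: sum(nbrs.values()) for v, nbrs in wadj.items()}
    hubs = sorted(wadj, key=lambda v: (-degree[v], v))[:num_hubs]
    dist = {h: dijkstra(wadj, h) for h in hubs}
    label = {}
    for v in wadj:
        reachable = [h for h in hubs if v in dist[h]]
        if reachable:
            label[v] = min(reachable, key=lambda h: (dist[h][v], -degree[h]))
    return label


# Hypothetical example: node 'x' lies at distance 1 from both hubs.
wadj = {'h1': {'x': 1.0, 'p': 4.0, 'q': 4.0}, 'h2': {'x': 1.0, 'r': 3.0, 's': 3.0},
        'x': {'h1': 1.0, 'h2': 1.0}, 'p': {'h1': 4.0}, 'q': {'h1': 4.0},
        'r': {'h2': 3.0}, 's': {'h2': 3.0}}
print(weighted_hub_labels(wadj, 2)['x'])  # 'h1', the hub with the larger degree
```

Node identifiers must be mutually comparable (for instance all strings or all integers), since the heap compares them when two distances are equal.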

The potential of the above described hub-based community finding approach is illustrated in the
following with respect to complex networks obtained for the Zachary karate club, image
segmentation, and concept association. In order to rate the quality of the obtained communities,
we consider the modularity index $Q$ [?]. Let the number of nodes and edges completely contained
inside community $i$ be denoted by $N_i$ and $n_i$, respectively, and the number of edges
linking community $i$ to the remainder of the network be represented as $A_i$. The modularity
index $Q$ can now be defined by Equation 3, where $e_{ii} = n_i/n$ and
$a_i = (2n_i + A_i)/(2n)$. Observe that $Q < 1$, reaching null value for a random partition of
the network [?].
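
The modularity index can be computed directly from these definitions. The sketch below assumes an undirected, unweighted network given as a list of edges, each listed once, and reads $A_i$ as the number of edges with exactly one end inside community $i$; the function name `modularity` is hypothetical.

```python
def modularity(edges, label):
    """Q = sum_i (e_ii - a_i^2), with e_ii = n_i / n and
    a_i = (2 n_i + A_i) / (2 n), for undirected unit-weight edges."""
    n = len(edges)
    inside = {c: 0 for c in set(label.values())}    # n_i
    crossing = {c: 0 for c in set(label.values())}  # A_i
    for u, v in edges:
        cu, cv = label[u], label[v]
        if cu == cv:
            inside[cu] += 1
        else:
            crossing[cu] += 1
            crossing[cv] += 1
    q = 0.0
    for c in inside:
        e_ii = inside[c] / n
        a_i = (2 * inside[c] + crossing[c]) / (2 * n)
        q += e_ii - a_i ** 2
    return q


# Hypothetical example: two triangles joined by a single edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
label = {1: 'A', 2: 'A', 3: 'A', 4: 'B', 5: 'B', 6: 'B'}
print(modularity(edges, label))  # 2 * (3/7 - 1/4) = 5/14, about 0.357
```

Placing all six nodes in a single community gives $e_{11} = a_1 = 1$ and therefore $Q = 0$.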

We consider the Zachary karate club data first. The network obtained from this dataset is often
considered as a benchmark for community finding methodologies [?]. Observe that this network is
unweighted (i.e. unit weights) and undirected. Figure 1 shows the communities obtained by the
hub-based algorithm (small and large nodes) considering $M = 2$, followed by the above described
node merging post-processing considering $T = 0.4$. The edge-boundary between the two
communities is identified by thicker edges. The only node misclassified by the methodology
(node 3) lies at the boundary between the two communities and presents the same number of links
with both of them. The quality of such a partition, which is precisely the same as that obtained
in [9], is characterized by $Q = 0.36$.

Now we draw attention to the simple image in Figure 2(a), which contains a floppy-disk, a coin
and a pencil. The objective here is to segment the image into reasonable regions of interest,
namely the three objects. As in [?], each image pixel is understood as a node, and the absolute
difference between the gray-levels at any two pixels $i$ and $j$ is taken as the respective
weight $w(i,j)$. Therefore, two pixels with similar gray-level are connected by an edge with
small weight, which can be understood as expressing the similarity between those pixels [19].
Unlike in [?], such a fully connected graph is not thresholded, therefore avoiding one
adjustable parameter, and the identification of the hubs is not performed sequentially along the
processing, but as its first step. As such, the obtained network is weighted and undirected (the
difference between pixels is symmetric). It should be observed that the consideration of image
segmentation as a community finding benchmark is particularly interesting, not only because of
the easy visualization of the obtained results, but also for the possibility to immediately
check the coherence and quality of the obtained communities, which should correspond to the main
regions in the original image. In order to quantify the quality of the obtained partitions, the
template image in Figure 2(b) is considered as the reference for the correct classes. Such a
template was obtained by a human operator by considering the original, higher resolution image
from which the image in (a) was derived by subsampling [6]. The results obtained by the
hub-based approach considering $M = 2$, shown in Figure 2(c), can be found to be in good
agreement with the template in (b). It should be observed that, as several hubs are obtained for
the same region as a consequence of the weight-assignment procedure (which produces a
fully-connected graph as a result), the two hubs were sampled manually, one from each of the two
regions. The obtained value of $Q$ for such partitioning was found to be equal to 0.007, which
is so low because of the many connections between the two classes implied by the fully
connected weight matrix.
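
The weight assignment used for the image can be sketched in a few lines. Since absolute gray-level differences satisfy the triangle inequality, the direct edge between two pixels is already a shortest path in the fully connected graph, so the nearest-hub rule reduces to comparing each pixel with the hub pixels. The function names, the flat list of gray levels and the manually chosen hub indices below are illustrative assumptions, not the data of the experiment.

```python
def gray_level_weights(pixels):
    """Full weight matrix w[i][j] = |g_i - g_j| for a flat list of gray levels."""
    n = len(pixels)
    return [[abs(pixels[i] - pixels[j]) for j in range(n)] for i in range(n)]


def segment(pixels, hub_indices):
    """Label every pixel with the index of its closest hub pixel; ties go
    to the hub with the larger degree (row sum of the weight matrix)."""
    w = gray_level_weights(pixels)
    degree = [sum(row) for row in w]
    return [min(hub_indices, key=lambda h: (w[h][i], -degree[h]))
            for i in range(len(pixels))]


# Hypothetical six-pixel image with a dark and a bright region;
# pixels 0 and 3 play the role of the manually sampled hubs.
pixels = [10, 12, 11, 200, 205, 198]
print(segment(pixels, [0, 3]))  # [0, 0, 0, 3, 3, 3]
```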

FIG. 2: Original image (a), reference template obtained by manual segmentation (b), and
respective hub-based segmentation into two regions of interest obtained by adopting M = 2 (c).

Finally, we consider the concept association experiment reported in [20] (see also [?]), which
involved word associations by a human subject. A weighted, directed network is obtained by
considering each distinct word as a node, while the weight of the edge between a node $i$ and a
node $j$ corresponds to the number of times the word associated to $i$ was followed by that
associated to $j$. The hub-based community finding algorithm was applied with $M = 10$ and
$T = 0.4$. Table I shows five of the principal hub-words and some of the words falling in the
respectively defined communities, which include directly (shown in italics) and indirectly
associated words. The word 'fast', for instance, was included into the community dominated by
the hub 'animal' through the following stream of associations:
animal → butterfly → wing → airplane → fast. The values of $Q$ for $M = 2$ to 50 with and
without the node-merging scheme are shown in Figure 3. It is clear from this curve that such
post-processing is highly effective in increasing the quality of the network partitioning.
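
The construction of the weighted, directed association network can be sketched by counting consecutive word pairs. The function name `association_network` and the short word sequence below are hypothetical and serve only to show the counting; they are not the data of the experiment.

```python
from collections import defaultdict


def association_network(words):
    """Return w[a][b] = number of times word a was immediately followed by b."""
    w = defaultdict(lambda: defaultdict(int))
    for a, b in zip(words, words[1:]):
        w[a][b] += 1
    # Convert to plain dictionaries so that lookups do not create entries.
    return {a: dict(targets) for a, targets in w.items()}


# Made-up sequence of associated words.
words = ["animal", "butterfly", "wing", "airplane", "fast", "animal", "butterfly"]
net = association_network(words)
print(net["animal"]["butterfly"])    # 2
print(sum(net["animal"].values()))   # outdegree of 'animal': 2
```

The outdegree of a word, used to select the hubs, is the sum of the weights of its emerging edges.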

All in all, the use of network hubs as references for finding communities along the network,
which can be obtained through label propagation, has been found to provide a natural and
powerful means for partitioning complex networks, especially those organized around hubs (e.g.
scale-free networks), into coherent subgraphs. The potential of the reported approach has been
substantiated with respect to three case-examples of weighted/unweighted and directed/undirected
networks. Given its low computational demand (order $N$), this methodology presents good
potential for several applications in complex network research. Future works may target further
validation of the methodology and the consideration of other propagating schemes, such as
starting the label flooding from the nodes with the lowest degree (the end-vertices, with unit
degree). It would be possible to assign communities to the groups of nodes furthest away from
the end-vertices, and compare such communities with those induced by the hubs. In addition,
other special nodes or subgraphs can be considered as starting points for the flooding,
including specific paths and cycles. The latter possibility is particularly promising for
analysing networks grown around basic cycles, such as the metabolic networks. A particularly
interesting perspective is to explore the use of properties of the obtained node- and
edge-boundaries, such as the number of edges/nodes respectively involved, in order to quantify
the quality of the obtained results. For instance, a small border between two regions with
similar number of nodes can be taken as an indication of high-quality community finding.
Another issue to be pursued further is to identify which community finding algorithms are more
suitable with respect to the several types of complex networks.

FIG. 3: The values of the modularity Q for M = 2 to 50 without (solid line) and with
node-merging post-processing with T = 0.4 (dotted line).

sun (18)    drink (15)   cold (15)   way (15)   animal (15)
pyramid     soft         water       easy       cat
round       eat          sky         rough      horse
yellow      well         wool        good       brown
circle      much         air         one        butterfly
hot         few          pullover    brief      wing
triangle                 sheep       fine       airplane
drawing                  thin        single     fast

TABLE I: Five of the hubs with highest degree (indicated within parentheses) and some of the
related concepts included in the respective communities. Directly associated concepts are shown
in italics.